Abstract. Owing to the computational and storage efficiency of compact
binary codes, hashing has been widely used for large-scale similarity search.
Unfortunately, many existing hashing methods based on observed keyword
features are not effective for short texts because of their sparseness and
shortness. Recently, some researchers have tried to utilize latent topics of a
certain granularity to preserve semantic similarity in hash codes beyond
keyword matching. However, topics of a single granularity are not adequate
to represent the intrinsic semantic information. In this paper, we present
a novel unified approach for short text Hashing using Multi-granularity
Topics and Tags, dubbed HMTT. In particular, we propose a selection
method to choose the optimal multi-granularity topics depending on the
type of dataset, and design two distinct hashing strategies to incorporate
multi-granularity topics. We also propose a simple and effective method
to exploit tags to enhance the similarity of related texts. We carry out
extensive experiments on one short text dataset as well as on one normal
text dataset. The results demonstrate that our approach is effective and
significantly outperforms baselines on several evaluation metrics.
1 Introduction

With the explosion of social media, numerous short texts have become available
in a variety of genres, e.g. tweets, instant messages, questions on Question and
Answer (Q&A) websites and online advertisements [5]. To conduct fast similarity
search in such massive datasets, hashing, which learns similarity-preserving
binary codes for document representation, has been widely used. Unfortunately,
many existing hashing methods based on the keyword feature space usually fail
to fully preserve the semantic similarity of short texts because of the
sparseness of the original feature space. For example, consider the following
three short texts:
d1: "Rafael Nadal missed the Australian Open";
d2: "Roger Federer won Grand Slam title";
d3: "Tiger Woods broke numerous golf records".



Obviously, hashing methods based on the keyword space cannot capture the
similarity among d1, d2 and d3. In recent years, some researchers have sought
to address this challenge with latent semantic approaches. For example, Wang et
al. [12] preserve the semantic similarity of documents in hash codes by fitting
the topic distributions, and Xu et al. [14] directly treat the latent topic
features as tokens to represent a document for hashing learning. However, topics
of a single granularity are not adequate to represent the intrinsic semantic
information [?]. As we know, topic models with different pre-defined numbers of
topics can extract topics at different semantic levels. For example, a topic
model with a large number of topics can extract fine-grained topic features,
such as "Tennis Open Progress" for d1 and d2, and "Golf Star News" for d3, but
fails to capture the semantic relevance of d3 to the other texts; a topic model
with a few topics can extract coarse-grained semantic features, such as "Sport"
and "Star" for d1, d2 and d3, but lacks distinguishing information and cannot
learn the hashing function effectively. It is therefore a reasonable assumption
that multi-granularity topics are more suitable for preserving semantic
similarity and learning hashing functions for short text hashing.

On the other hand, tags are not fully utilized in many hashing methods. In
various real-world applications, documents are often associated with multiple
tags, which provide useful knowledge for learning effective hash codes [12].
For instance, on Q&A websites, each question has category labels or related tags
assigned by its questioner. Another example is microblogging, where some tweets
are labeled by their authors with hashtags in the form of "#keyword". Thus, we
should fully exploit the information contained in tags to strengthen the
semantic relationship of related texts for hashing learning.

Based on the above observations, this paper proposes a unified approach for
short text Hashing using Multi-granularity Topics and Tags, referred to as HMTT
for simplicity. In HMTT, two different ways are introduced to incorporate
multi-granularity topics and tag information for improving short text hashing.

The main contributions of this paper are three-fold. Firstly, a novel unified
short text hashing approach is proposed. To the best of our knowledge, this is
the first time that multi-granularity topics and tags are incorporated into a
unified hashing approach, and experiments are conducted to verify our assumption
that short text hashing can be improved by integrating multi-granularity topics
and tags. Secondly, the optimal multi-granularity topics can be selected
automatically, so as to extract effective latent topic features for hashing
learning. The experimental results indicate that the optimal multi-granularity
topics achieve better performance than other multi-granularity topics. Finally,
two strategies to incorporate multi-granularity topics for short text hashing
are designed and compared through extensive experimental evaluations and
analyses.


2 Related Work 

Hash-based methods can be mainly divided into two categories. One category
is data-oblivious hashing. As the most popular hashing technique,
Locality-Sensitive Hashing (LSH) [1], based on random projection, has been
widely used for similarity search. However, since such methods are not aware of
the data distribution, they may generate quite inefficient hash codes in
practice [16]. Recently, more researchers have focused on the other category,
data-aware hashing. For example, Spectral Hashing (SpH) [13] generates compact
binary codes by imposing balance and uncorrelatedness constraints on the learned
codes. Self-Taught Hashing (STH) [18] and Two Step Hashing (TSH) [9] decompose
the learning procedure into two steps: generating binary codes and learning
hash functions; a supervised version of STH, denoted as STHs, is proposed in
[18]. However, these hashing methods work directly in the keyword feature space
and usually fail to fully preserve semantic similarity. More recently, Wang et
al. [12] proposed Semantic Hashing using Tags and Topic Modeling (SHTTM). SHTTM
has two limitations. First, although the topic distributions are used to
preserve content similarity when generating hash codes, the topics are not used
to improve hash function learning. Second, the number of topics must be equal to
the number of hash code dimensions, an assumption that is too strict to capture
the optimal semantic features for different types of datasets.

Fig. 1. The proposed approach HMTT for short text hashing (two phases: offline
learning and online predicting)

3 Algorithm Description

The unified short text hashing approach HMTT is depicted in Fig. 1. Given a
dataset of n training texts denoted as $X = \{x_1, x_2, \ldots, x_n\} \in
\mathbb{R}^{d \times n}$, where d is the dimensionality of the keyword feature
space, denote their tags as $t = \{t_1, t_2, \ldots, t_n\} \in \{0,1\}^{q
\times n}$, where q is the total number of possible tags associated with each
text. A tag with label 1 means that a text is associated with a certain
tag/category, while a tag with label 0 means a missing tag or that the text is
not associated with that tag/category. The goal of HMTT is to obtain the optimal
binary codes $Y = \{y_1, y_2, \ldots, y_n\}^T \in \{-1,1\}^{n \times l}$ and a
hashing function $f: \mathbb{R}^d \to \{-1,1\}^l$, which embeds a query text
$x_q$ to its binary vector representation $y_q$ with l bits. To achieve the
similarity-preserving property, we require similar texts to have similar binary
codes in Hamming space. In the first phase, which is offline, we select the
optimal topic models from the candidate topic models and extract the
multi-granularity topic features $\{\theta_1, \ldots, \theta_M\}$. Then the
binary codes and hash functions are learned by integrating the multi-granularity
topic features and tags. In the second phase, which is online, the query text is
represented by the binary code produced by the learned hash function, and the
approximate nearest neighbor search is then carried out in Hamming space. Texts
whose hash codes lie within a certain Hamming distance of each other are
regarded as semantically similar.

The main challenges of this idea are: (1) how to select the optimal topic
models; (2) how to utilize the tag information efficiently; and (3) how to
integrate the multi-granularity topics to preserve semantic similarity. The
proposed approach HMTT is described in detail in the following sections.

Algorithm 1 The Optimal Topics Selection

Input: n training texts $X = \{x_1, x_2, \ldots, x_n\}$ with tags $t = \{t_1,
t_2, \ldots, t_n\}$, N candidate topic sets $T = \{T_1, T_2, \ldots, T_N\}$ and
a specified number M.
Output: The optimal topic sets O, and the weight vector $\mu$.
1: Sample a subset $\hat{X}$ with tags $\hat{t}$; initialize $\mu \leftarrow 0$ and $O \leftarrow \emptyset$;
2: for each text $x \in \hat{X}$ do
3:   Find $nn^+(x)$ and $nn^-(x)$;
4:   for $i \leftarrow 1$ to N do
5:     Update $\mu(T_i)$ by Eq. (1);
6:   end for
7: end for
8: while size(O) < M do
9:   $T^{(p)} = \arg\max_{T_i \in T} \mu(T_i)$; update $O = O \cup \{T^{(p)}\}$, $T = T - \{T^{(p)}\}$;
10: end while
11: return O and $\mu$;
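As a small illustration of the online phase described in this section, the sketch below packs $\{-1, +1\}$ codes into integers and retrieves all database codes within a given Hamming radius of a query. It is only a minimal sketch under stated assumptions: the helper names `pack_code`, `hamming_distance` and `search_within_radius` are hypothetical, and a linear scan is used instead of any indexing structure.

```python
def pack_code(code):
    """Pack a sequence of -1/+1 bits into a non-negative integer (hypothetical helper)."""
    value = 0
    for bit in code:
        value = (value << 1) | (1 if bit > 0 else 0)
    return value


def hamming_distance(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def search_within_radius(query, database, radius):
    """Indices of database codes whose Hamming distance to the query is <= radius."""
    return [i for i, code in enumerate(database)
            if hamming_distance(query, code) <= radius]


# Toy usage with 4-bit codes.
database = [pack_code(c) for c in [(1, 1, -1, -1), (1, -1, -1, -1), (-1, -1, 1, 1)]]
query = pack_code((1, 1, -1, -1))
neighbors = search_within_radius(query, database, radius=1)
```

Packing the codes lets the distance be computed with a single XOR and a bit count, which is the main reason compact binary codes are cheap to compare.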

3.1 Estimate and Select the Optimal Topics 

In this work, we obtain a set of candidate topics simply by pre-defining
several different topic numbers for Latent Dirichlet Allocation (LDA) [3]. After
training the topic models, we can draw multi-granularity topic features, i.e.
the distributions over the topics, from the candidate topic models.

In order to select the optimal topic models, we utilize the tag information to
evaluate the quality of the topics. Inspired by [?], the selection of the
optimal topic sets depends on their capability to help discriminate short texts
that do not share any common tags. We denote the N different sets of topics as
$T = \{T_1, T_2, \ldots, T_N\}$. For each entry $T_i$, the topic distributions
over documents are denoted as $\theta_i = p(z|x)$. The weight vector is $\mu =
\{\mu(T_1), \ldots, \mu(T_N)\}$, where $\mu(T_i)$ is the weight indicating the
importance of topic set $T_i$. The purpose is to select the optimal topic sets
$O = \{T_1, T_2, \ldots, T_M\}$. In [4], Chen et al. evaluate the quality of
topics based on two aspects: discrimination and complementarity of the
multi-granularity topics. However, how to balance those two aspects is a tricky
problem, and the latter aspect, complementarity, easily introduces noise when
preserving similarity. Thus, we propose a simple and effective method based
directly on the key idea of Relief [7], as follows. Firstly, a subset $\hat{X} =
\{x_1, x_2, \ldots, x_m\}$ with tags $\hat{t} = \{t_1, t_2, \ldots, t_m\}$ is
sampled from the training dataset, and we find two groups of k nearest neighbors
for each text $x$: one group is from the texts sharing any common tags (denoted
as $nn^+(x)$), and the other is from the texts not sharing any common tags
(denoted as $nn^-(x)$). Then the weight is updated as follows:


$$\mu(T_i) = \mu(T_i) + \sum_{j=1}^{k} \frac{D_{KL}(T_i(x), T_i(nn_j^-(x)))}{k} - \sum_{j=1}^{k} \frac{D_{KL}(T_i(x), T_i(nn_j^+(x)))}{k} \qquad (1)$$

where $D_{KL}$ is the symmetric Kullback-Leibler (KL) divergence:

$$D_{KL}(T_i(x), T_i(nn_j^-(x))) = \frac{1}{2} \sum_{z_k \in T_i} \left( p(z_k|x) \log\frac{p(z_k|x)}{p(z_k|nn_j^-(x))} + p(z_k|nn_j^-(x)) \log\frac{p(z_k|nn_j^-(x))}{p(z_k|x)} \right),$$

and the value of $D_{KL}(T_i(x), T_i(nn_j^+(x)))$ is defined in the same way.
After updating the weight vector, we directly select the optimal topic sets O
according to the top-M weight values. In summary, the optimal topics selection
procedure is depicted in Algorithm 1.
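The selection procedure can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, every text is used instead of a sampled subset, and the nearest neighbors are searched in each candidate topic space with the symmetric KL divergence, which is an assumption because the text does not state in which space the neighbors are found.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two topic distributions (smoothed by eps)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return 0.5 * float(np.sum(p * np.log(p / q) + q * np.log(q / p)))


def relief_weight(theta, tags, k=1):
    """Relief-style weight of one candidate topic set.

    theta: (n, K) topic distributions of n texts; tags: (n, q) binary tag matrix.
    """
    theta = np.asarray(theta, dtype=float)
    tags = np.asarray(tags)
    n = theta.shape[0]
    shares_tag = (tags @ tags.T) > 0
    weight = 0.0
    for i in range(n):
        dist = np.array([symmetric_kl(theta[i], theta[j]) for j in range(n)])
        others = np.arange(n) != i
        hits = np.where(shares_tag[i] & others)[0]      # candidates for nn+
        misses = np.where(~shares_tag[i] & others)[0]   # candidates for nn-
        near_hits = hits[np.argsort(dist[hits])[:k]]
        near_misses = misses[np.argsort(dist[misses])[:k]]
        weight += dist[near_misses].sum() / k - dist[near_hits].sum() / k
    return weight


def select_topic_sets(candidates, tags, m, k=1):
    """Return the names of the top-m candidate topic sets and all weights."""
    weights = {name: relief_weight(theta, tags, k) for name, theta in candidates.items()}
    chosen = sorted(weights, key=weights.get, reverse=True)[:m]
    return chosen, weights


# Toy usage: texts 0 and 1 share one tag, texts 2 and 3 share another.
tags = [[1, 0], [1, 0], [0, 1], [0, 1]]
candidates = {
    "separating": [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
    "mixing": [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]],
}
chosen, weights = select_topic_sets(candidates, tags, m=1)
```

In the toy data, the candidate whose topic distributions separate the two tag groups receives a positive weight, while the one that mixes them receives a negative weight, so only the former is selected.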


3.2 Content Similarity and Tags Preservation 

In hashing problems, one key component is how to define the affinity matrix S.
Diverse approaches can be applied to construct the similarity matrix. In this
paper, we choose the cosine function as an example and use the local similarity
structure of all text pairs to construct the similarity function as follows:

$$S_{ij} = \begin{cases} c_{ij} \cdot \dfrac{x_i^T x_j}{\|x_i\| \cdot \|x_j\|}, & \text{if } x_i \in NN_k(x_j) \text{ or vice versa} \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $NN_k(x)$ represents the set of k-nearest-neighbors of x, and $c_{ij}$ is
a confidence coefficient. If two documents $x_i$ and $x_j$ share any common tag,
we set $c_{ij}$ to a higher value a. Conversely, $c_{ij}$ is given a lower value
b if the two documents $x_i$ and $x_j$ are not related. The parameters a and b
satisfy $1 \ge a \ge b > 0$. For a particular dataset, the more trustworthy the
tags are, the greater the difference between a and b should be. In our
experiments, we set a = 1 and b = 0.1.
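The construction of S in Eq. (2) can be sketched as follows. This is a minimal sketch under stated assumptions: `confidence_similarity` is a hypothetical helper, texts are stored as rows of `X` (the paper stores them as columns), neighbors are ranked by cosine similarity with each text excluded from its own neighbor list, and k must be smaller than the number of texts.

```python
import numpy as np


def confidence_similarity(X, tags, k=25, a=1.0, b=0.1):
    """Tag-weighted k-NN cosine similarity matrix in the spirit of Eq. (2).

    X: (n, d) feature rows; tags: (n, q) binary tag matrix.
    """
    X = np.asarray(X, dtype=float)
    tags = np.asarray(tags)
    n = X.shape[0]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                     # avoid division by zero for empty texts
    unit = X / norms
    cos = unit @ unit.T

    masked = cos.copy()
    np.fill_diagonal(masked, -np.inf)           # a text is not its own neighbor
    order = np.argsort(-masked, axis=1)[:, :k]  # k most similar texts per row

    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.repeat(np.arange(n), k), order.ravel()] = True
    is_nn |= is_nn.T                            # "or vice versa"

    share_tag = (tags @ tags.T) > 0
    confidence = np.where(share_tag, a, b)      # c_ij
    return np.where(is_nn, confidence * cos, 0.0)


# Toy usage: texts 0 and 1 share a tag, texts 2 and 3 do not.
X = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
tags = [[1, 0], [1, 0], [0, 1], [1, 0]]
S = confidence_similarity(X, tags, k=1)
```

With a = 1 and b = 0.1, a neighboring pair with a common tag keeps its full cosine similarity, while a neighboring pair without one is down-weighted by a factor of ten.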


3.3 Learning to Hash with Multi-Level Topics 

Below, from different perspectives, we propose two strategies to integrate
multi-granularity topics for improving short text hashing.

Feature-Level Fusion In order to integrate multi-granularity topics, we adopt a
simple but powerful way to combine observed features and latent features for
short texts, similar to [?] and [3], and create a high-dimensional vector
$\Omega$ as:

$$\Omega = [\mu_1 \theta_1, \mu_2 \theta_2, \ldots, \mu_M \theta_M], \qquad (3)$$

where $\{\theta_1, \theta_2, \ldots, \theta_M\}$ are the optimal topic features,
and

$$\mu_i = \mu(T_i) / \min_{T_k \in O} \mu(T_k). \qquad (4)$$

We can straightforwardly construct the similarity matrix S by Eq. (2) with the
new features of the training texts. Similar to Two-Step Hashing (TSH) [9], we
treat binary code generation and hash function learning as two separate steps.
As a specific example, the Laplacian affinity loss and linear SVMs are chosen to
solve our problem. In the first step, the hash code learning procedure can be
formulated as the following optimization:

$$\min_{Y} \sum_{i,j=1}^{n} S_{ij} \|y_i - y_j\|_F^2 \quad \text{s.t. } Y \in \{-1,1\}^{n \times l},\ Y^T \mathbf{1} = 0,\ Y^T Y = I \qquad (5)$$

where $S_{ij}$ is the pairwise similarity between documents $x_i$ and $x_j$,
$y_i$ is the hash code for $x_i$, and $\|\cdot\|_F$ is the Frobenius norm. To
preserve similarity, we seek to minimize this quantity, because it incurs a
heavy penalty if two similar documents are mapped far apart. The problem is
relaxed by discarding the constraint $Y \in \{-1,1\}^{n \times l}$; the optimal
l-dimensional real-valued vectors $\tilde{Y}$ can then be obtained by solving a
Laplacian Eigenmaps problem [2]. Then, $\tilde{Y}$ is converted into binary
codes Y via the median vector $m = \mathrm{median}(\tilde{Y})$. In the hash
function learning step, treating each bit $y \in \{+1, -1\}$ of the binary code
as a binary class label for that text, we can train l linear SVM classifiers
$f(x) = \mathrm{sgn}(W^T x)$ to predict the l-bit binary code for any query
document $x_q$. Algorithm 2 shows the procedure of this strategy.

Algorithm 2 Feature-Level Fusion Procedure

Input: A set of n training texts X with tags t, M optimal topic models O
associated with their weight vector $\mu$.
Output: The optimal hash codes Y and the hash function: l linear SVM classifiers.
1: Extract M topic feature sets $\{\theta_1, \theta_2, \ldots, \theta_M\}$ from the optimal topic models O;
2: Produce the new feature $\Omega$ by Eq. (3) and construct the confidence matrix S by Eq. (2);
3: Obtain the l-dimensional vectors $\tilde{Y}$ by optimizing Eq. (5);
4: Generate Y by thresholding $\tilde{Y}$ to the median vector $m = \mathrm{median}(\tilde{Y})$;
5: Train l linear SVM classifiers by the learned codes Y;
6: return Hash codes Y and l linear SVMs;
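The first step of the feature-level strategy can be sketched as below. This is only a sketch under stated assumptions: the function names are hypothetical, the raw weights are assumed to be positive, and the relaxed problem is solved with the ordinary eigenvectors of the unnormalized graph Laplacian (matching the constraint $Y^T Y = I$), which assumes a connected similarity graph. The second step, training one linear SVM per bit, is omitted.

```python
import numpy as np


def fuse_topic_features(thetas, raw_weights):
    """Scale each topic feature block by its normalized weight and concatenate.

    thetas: list of (n, K_i) arrays; raw_weights: one positive weight per block.
    """
    w = np.asarray(raw_weights, dtype=float)
    balance = w / w.min()                                   # Eq. (4)
    blocks = [b * np.asarray(t, dtype=float) for b, t in zip(balance, thetas)]
    return np.hstack(blocks), balance                       # Eq. (3)


def relaxed_codes(S, n_bits):
    """Real-valued codes from the eigenvectors of L = D - S with the smallest
    non-zero eigenvalues (the constant eigenvector is skipped)."""
    S = np.asarray(S, dtype=float)
    laplacian = np.diag(S.sum(axis=1)) - S
    _, vectors = np.linalg.eigh(laplacian)                  # ascending eigenvalues
    return vectors[:, 1:n_bits + 1]


def binarize_by_median(Y_real):
    """Threshold each bit at its median so that the bits are roughly balanced."""
    m = np.median(Y_real, axis=0)
    return np.where(Y_real > m, 1, -1)


# Toy usage: two tightly connected pairs of texts with weak links between them.
fused, balance = fuse_topic_features(
    [np.array([[0.5, 0.5]]), np.array([[0.2, 0.3, 0.5]])], raw_weights=[2.0, 1.0])
S = np.array([[0.0, 1.0, 0.01, 0.0],
              [1.0, 0.0, 0.0, 0.01],
              [0.01, 0.0, 0.0, 1.0],
              [0.0, 0.01, 1.0, 0.0]])
codes = binarize_by_median(relaxed_codes(S, n_bits=1))
```

In the toy graph, the single learned bit separates the two strongly connected pairs, which is the behavior the Laplacian loss is meant to encourage.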

Decision-Level Fusion From another perspective, we can treat the optimal
multi-granularity topic feature sets $\{\theta_1, \theta_2, \ldots, \theta_M\}$
extracted from short texts as multi-view features. In our situation, there are M
views of features. We take a linear sum of those M-view similarities as follows:

$$\sum_{k=1}^{M} \sum_{i,j=1}^{n} S_{ij}^{(k)} \|y_i - y_j\|_F^2, \qquad (6)$$

where $S^{(k)}$, constructed as in Eq. (2), is the affinity matrix defined on
the k-th view features. By introducing a diagonal n × n matrix $D^{(k)}$ whose
entries are given by $D_{ii}^{(k)} = \sum_{j=1}^{n} S_{ij}^{(k)}$, Eq. (6) can be
rewritten, up to a constant factor, as $\mathrm{tr}(Y^T \sum_{k=1}^{M} (D^{(k)} -
S^{(k)}) Y) = \mathrm{tr}(Y^T \sum_{k=1}^{M} L^{(k)} Y)$, where $L^{(k)}$ is the
Laplacian matrix defined on the k-th view features. By introducing Composite
Hashing with Multiple Information Sources (CHMIS) [15], as a representative of
Multiple View Hashing (MVH), we can simultaneously learn the hash codes Y of the
training texts X as well as a set of linear hash functions $\sum_{k=1}^{M}
\alpha_k (W^{(k)})^T x^{(k)}$ to infer the hash code for a query text $x_q$. The
overall objective function is given as follows:

$$\min_{Y, W, \alpha} \; C_1\, \mathrm{tr}\Big(Y^T \sum_{k=1}^{M} L^{(k)} Y\Big) + C_2 \Big\| Y - \sum_{k=1}^{M} \alpha_k (W^{(k)})^T X^{(k)} \Big\|_F^2 + \sum_{k=1}^{M} \|W^{(k)}\|_F^2$$
$$\text{s.t. } Y \in \{-1,1\}^{n \times l},\ Y^T \mathbf{1} = 0,\ Y^T Y = I,\ \alpha^T \mathbf{1} = 1,\ \alpha \ge 0 \qquad (7)$$

where $C_1$ and $C_2$ are trade-off parameters, $\mathrm{tr}(\cdot)$ is the
matrix trace function, $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_M]$ is a
combination coefficient vector that balances the outputs from each view, and
$W = \{\alpha_1 W^{(1)}, \alpha_2 W^{(2)}, \ldots, \alpha_M W^{(M)}\}$ is a
series of linear hash function matrices. To solve this hard optimization
problem, we first relax the discrete constraint $Y \in \{-1,1\}^{n \times l}$
and then iteratively optimize one variable with the other two fixed. More
detailed optimization procedures of this method can be found in [15]. Unlike the
former strategy, we do not need to pre-allocate the weight of each view, because
the combination coefficient vector $\alpha$, learned iteratively during
optimization, balances the outputs of the views. The procedure of this strategy
is shown in Algorithm 3.

Algorithm 3 Decision-Level Fusion Procedure

Input: A set of n training texts X with tags t, M optimal topic models O and
trade-off parameters $C_1$ and $C_2$.
Output: The optimal hash codes Y and a set of linear hash function matrices W.
1: Extract M topic feature sets $\{\theta_1, \theta_2, \ldots, \theta_M\}$ from the optimal topic models O;
2: Construct a series of confidence matrices $\{S^{(1)}, \ldots, S^{(M)}\}$ by Eq. (2) for the M feature sets $\{\theta_1, \theta_2, \ldots, \theta_M\}$;
3: Obtain the l-dimensional vectors $\tilde{Y}$ and W by optimizing Eq. (7);
4: Generate Y by thresholding $\tilde{Y}$ to the median vector $m = \mathrm{median}(\tilde{Y})$;
5: return Hash codes Y and hash function matrix set W;
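The relation between the pairwise multi-view loss and its trace form can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the affinity matrices are assumed to be symmetric, and the full alternating optimization of CHMIS is not reproduced. For symmetric affinities, the pairwise sum equals exactly twice the trace expression.

```python
import numpy as np


def graph_laplacian(S):
    """Unnormalized graph Laplacian L = D - S of one view."""
    S = np.asarray(S, dtype=float)
    return np.diag(S.sum(axis=1)) - S


def pairwise_loss(S_views, Y):
    """Sum over views of sum_ij S_ij^(k) * ||y_i - y_j||^2."""
    Y = np.asarray(Y, dtype=float)
    sq_dist = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return float(sum((np.asarray(S, dtype=float) * sq_dist).sum() for S in S_views))


def trace_loss(S_views, Y):
    """tr(Y^T (sum_k L^(k)) Y) computed from the summed Laplacians."""
    Y = np.asarray(Y, dtype=float)
    total = sum(graph_laplacian(S) for S in S_views)
    return float(np.trace(Y.T @ total @ Y))


# Toy usage with two views of three texts and 2-bit codes.
S1 = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 0.5, 0.0]]
S2 = [[0.0, 0.2, 0.3], [0.2, 0.0, 0.0], [0.3, 0.0, 0.0]]
Y = [[1, -1], [1, 1], [-1, 1]]
lhs = pairwise_loss([S1, S2], Y)
rhs = trace_loss([S1, S2], Y)
```

Because the factor of two does not change the minimizer, working with the summed Laplacians is equivalent to minimizing the pairwise form.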


3.4 Complexity Analysis 

The training processes, including binary code learning and hash function
training, are always conducted offline. Thus, our efficiency analysis focuses on
the prediction process. Generating the hash code for a query text only involves
some Gibbs sampling iterations to extract the multi-granularity topics
$\{\theta_1, \theta_2, \ldots, \theta_M\}$ and dot products in the hash function
$y = \mathrm{sgn}(W^T x)$, which can be done in $O(rKs + lK)$. Here, r is the
number of Gibbs sampling iterations for topic inference, K is the sum of the
multi-granularity topic numbers $\{K_1, K_2, \ldots, K_M\}$, l is the
dimensionality of the hash code and s denotes the sparsity of the observed
keyword features. The values of these parameters can be regarded as quite small
constants. For example, r = 20, K ≈ 100, l ≤ 64, and the average number of
non-zero keyword features per document s is no more than 100 in our experimental
datasets. The dominant cost is therefore the Gibbs sampling for topic inference.
Many recent studies focus on accelerating topic inference. For example, the
Biterm Topic Model (BTM) [5] provides a simple and efficient method without
Gibbs sampling iterations, which reduces the time complexity of topic inference
to $O(Kb)$, where b is the number of biterms in a query text.
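A quick back-of-the-envelope calculation makes these orders of magnitude concrete. The sketch below simply evaluates the two terms of $rKs + lK$ for the example values quoted above; the helper names are hypothetical, and the biterm count assumes that a biterm is an unordered pair of words in the query text, evaluated for a text of 17 words.

```python
def prediction_cost(r, K, s, l):
    """Return the two terms of r*K*s + l*K (topic inference, hash projection)."""
    return r * K * s, l * K


def biterm_count(n_words):
    """Number of unordered word pairs in a text of n_words words (assumed biterm definition)."""
    return n_words * (n_words - 1) // 2


gibbs_ops, projection_ops = prediction_cost(r=20, K=100, s=100, l=64)
btm_ops = 100 * biterm_count(17)   # K * b for a 17-word query
```

Under these assumptions the Gibbs sampling term is about 200,000 operations against 6,400 for the projection, and the biterm-based inference term is 13,600.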

4 Experiment and Analysis 

4.1 Dataset and Experimental Settings 

We carried out extensive experiments on two publicly available real-world text
datasets: one is a typical short text dataset, Search Snippets¹, and the other
is a normal text dataset, 20Newsgroups².

The Search Snippets dataset, collected by Phan [10], was selected from the
results of web search transactions using predefined phrases of 8 different
domains. We further filter out the stop words and stem the texts. 20139 distinct
words, 10059 training texts and 2279 test texts are left, and the average text
length is 17.1.

The 20Newsgroups corpus was collected by Lang [8]. We use the popular 'bydate'
version, which contains 20 categories, 26214 distinct words, 11314 training
texts and 7532 test texts, and the average text length is 136.7.

For these datasets, we treat the category labels as tags. For Search Snippets,
we use a large-scale corpus [?] crawled from Wikipedia to estimate the topic
models, whereas for 20Newsgroups the original keyword features are used directly
for learning the candidate topic models because they are sufficient. To evaluate
our method's performance, we compute standard retrieval performance measures,
recall and precision, by using each document in the test set as a query to
retrieve documents in the training set within a specified Hamming distance.
Because the original keyword feature space cannot reflect the semantic
similarity of documents well, especially for short texts, we simply test whether
two documents share any common tag to decide whether they are semantically
similar. This methodology is used in SH [11], STH [18], CHMIS [15] and SHTTM
[12].
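The evaluation protocol can be sketched as follows. This is a minimal sketch with hypothetical names: codes are sequences of -1/+1 values, a retrieved text counts as relevant when it shares at least one tag with the query, and precision is defined as 0 when nothing is retrieved, which is a convention assumed here rather than stated in the text.

```python
def precision_recall_at_radius(query_code, query_tags, db_codes, db_tags, radius):
    """Precision and recall of texts retrieved within a Hamming radius of the query."""
    retrieved = [
        i for i, code in enumerate(db_codes)
        if sum(u != v for u, v in zip(query_code, code)) <= radius
    ]
    relevant = {i for i, t in enumerate(db_tags) if set(t) & set(query_tags)}
    hits = sum(1 for i in retrieved if i in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy usage with 4-bit codes and one tag per text.
db_codes = [(1, 1, 1, 1), (1, 1, 1, -1), (-1, -1, -1, -1), (1, -1, 1, 1)]
db_tags = [{"sports"}, {"business"}, {"sports"}, {"sports"}]
precision, recall = precision_recall_at_radius(
    (1, 1, 1, 1), {"sports"}, db_codes, db_tags, radius=1)
```

Averaging these values over all test queries, and repeating the computation for each code length, yields curves of the kind reported in the experiments.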

Five alternative hashing methods are compared with our proposed approach:
STHs [18], STH [18], LCH [17], LSI [?] and SpH [13]. The results of all baseline
methods are obtained using the open-source implementations provided on the
corresponding authors' homepages. In order to distinguish the two proposed
strategies in our approach, the feature-level fusion method is denoted as
HMTT-Fea, and the decision-level fusion method is denoted as HMTT-Dec.

1 http://jwebpro.sourceforge.net/data-web-snippets.tar.gz

2 http://people.csail.mit.edu/jrennie/20Newsgroups/

Fig. 2. Precision-Recall curves of retrieved examples within Hamming radius 3 on two
datasets with different hashing bits (4:4:64 bits).

In our experiments, the candidate topic sets are T = {T10, T30, T50, T70,
T90, T120, T150} and the number of optimal topic sets is fixed to 3. The
parameters $C_1$ and $C_2$ in Eq. (7) are tuned over {0.1, 1, 10, 100}. The
number of nearest neighbors is fixed to 25 when constructing the graph
Laplacians in our approach, as well as in the baseline methods STHs and STH. We
evaluate the performance of the different methods by varying the number of
hashing bits from 4 to 64. For LDA, we use the open-source implementation
GibbsLDA⁴, and the hyper-parameters are set to α = 0.5 and β = 0.01, with 1000
iterations of Gibbs sampling for learning and 20 iterations for topic inference.
The reported results are averaged over 5 runs.


4.2 Results and Analysis 

We randomly sample 100 texts with tag information for each category from the
training dataset and set k in Eq. (1) to 10 to evaluate the quality of the topic
sets by Algorithm 1. As the number of optimal topic sets is fixed to 3, we
obtain the optimal topic sets O = {T10, T30, T50} for both datasets
coincidentally, with the weight vectors μ = {3.44, 1.7, 1} for Search Snippets
and μ = {1.31, 1.22, 1} for 20Newsgroups. It is noteworthy that the weight
values of the topic sets are affected by both the type of dataset and the
settings of LDA. Below, a series of experiments are conducted to answer the
following questions: (1) how does the proposed approach HMTT compare with the
baseline methods; (2) whether the optimal multi-granularity topics can
outperform single-granularity topics and other multi-granularity topics; and (3)
which of the two strategies for integrating multi-granularity topics achieves
better performance.

3 https://github.com/jacoxu/short-text-hashing-HMTT,
http://www.CICLing.org/2015/data/148

4 http://jgibblda.sourceforge.net/













Table 1. Mean precision (mP) of the top 200 examples and of the retrieved examples
within Hamming radius 3 on Search Snippets with 8 and 16 hashing bits. 10-30-50*
means that the proposed methods incorporate the optimal multi-granularity topics,
and 10-30-50W1 means that the hashing method uses the multi-granularity topic sets
{T10, T30, T50} while fixing the balance values to 1:1:1.

                       mP@Top 200                      mP@Hamming Radius 3
                 HMTT-Fea        HMTT-Dec          HMTT-Fea        HMTT-Dec
Methods       8 bits 16 bits  8 bits 16 bits    8 bits 16 bits  8 bits 16 bits
10-30-50*     0.829  0.799    0.826  0.782      0.411  0.802    0.403  0.778
10-70-90      0.819  0.800    0.797  0.762      0.375  0.789    0.328  0.754
30-90-150     0.802  0.787    0.801  0.755      0.393  0.777    0.382  0.757
10-30         0.810  0.789    0.776  0.757      0.382  0.776    0.374  0.744
10-50         0.813  0.788    0.772  0.752      0.383  0.790    0.334  0.740
30-50         0.806  0.796    0.805  0.777      0.393  0.779    0.369  0.764
10-30-50W1    0.811  0.780    0.822  0.778      0.368  0.761    0.398  0.774
10            0.627  0.624    0.639  0.602      0.316  0.610    0.296  0.576
30            0.792  0.764    0.728  0.708      0.377  0.757    0.335  0.692
50            0.782  0.758    0.731  0.723      0.360  0.730    0.320  0.707
70            0.771  0.755    0.728  0.720      0.365  0.747    0.318  0.704
90            0.757  0.733    0.735  0.708      0.363  0.736    0.332  0.692
120           0.730  0.705    0.707  0.700      0.366  0.714    0.309  0.683
150           0.740  0.727    0.675  0.674      0.370  0.729    0.304  0.660


Compared with the existing hashing methods: In this section, we design an
improved version of STHs, denoted as STHs-Tag, by replacing the original
construction of the similarity matrix with the method described in Section 3.2.
We randomly remove 60 percent of the tags from the training dataset to verify
the robustness of HMTT-Fea, HMTT-Dec, STHs and STHs-Tag. The precision-recall
curves for the retrieved examples are reported in Fig. 2. From these comparison
results, we can see that HMTT-Fea and HMTT-Dec significantly outperform the
baseline methods on Search Snippets, as shown in Fig. 2 (a). For 20Newsgroups,
HMTT-Dec performs close to STHs-Tag in Fig. 2 (b). There are two reasons for
this. Firstly, 20Newsgroups, as a normal text dataset, has sufficient original
features to learn hash codes, so STHs-Tag based on keyword features works well.
Secondly, we learn the topic models of 20Newsgroups directly from the training
dataset, which imposes some restrictions. Furthermore, STHs performs worse than
STHs-Tag on both datasets. Because STHs uses a completely supervised approach
that only utilizes the pairwise similarity of documents with common tags, it
cannot deal well with situations in which tags are missing or incomplete. In our
approach, we extract the optimal multi-granularity topics depending on the type
of dataset to learn the hash codes and hashing function, and the tags are only
used to adjust the similarity, which makes the approach more robust. In the
following experiments, we keep all the tags to improve the performance of
hashing learning.

Compared with single-granularity and other multi-granularity topic
sets: Here, the hashing performance of the optimal multi-granularity topics is
compared with that of single-granularity and other multi-granularity topics. We
further evaluate the balance values of the multi-granularity topics by fixing
them to 1. In particular, we set the parameters $\mu_i$ in Eq. (3) and
$\alpha_k$ in Eq. (7) to 1 for HMTT-Fea and HMTT-Dec respectively. The
quantitative results on Search Snippets are reported in Table 1. From the
results, we can see that multi-granularity topics significantly outperform
single-granularity topics and that the optimal multi-granularity topics achieve
better performance in most situations. We also observe similar results on
20Newsgroups. Owing to space limitations, we only present the results on the
typical short text dataset, Search Snippets.

Compared between the proposed two strategies: Finally, we discuss the
performance of the two proposed strategies, HMTT-Fea and HMTT-Dec. In HMTT-Fea,
we directly concatenate the multi-granularity topics to produce one feature
vector and decompose the hashing learning problem into two separate stages. In
HMTT-Dec, the multi-granularity topics extracted from the text content are
treated as multi-view features, and we simultaneously learn the hash codes as
well as the hash function. From the results in Table 1, we can see that HMTT-Fea
surpasses HMTT-Dec on several evaluation metrics. The former strategy is thus
simpler and more effective for short text hashing in our approach. In summary,
for both HMTT-Fea and HMTT-Dec, the experimental results indicate that short
text hashing can be improved by integrating multi-granularity topics.


5 Discussions and Conclusions 

Short text hashing is a challenging problem because of the sparseness of the
text representation. To address this challenge, tags and latent topics should be
fully and properly utilized to improve hashing learning. Furthermore, it is
better to estimate the topic models from an external large-scale corpus, and the
optimal topics should be selected depending on the type of dataset. This paper
uses a simple and effective selection method based on the symmetric
KL-divergence of topic distributions; we think that many other selection methods
are worth exploring further. Another key issue worthy of research is how to
integrate the multi-granularity topics effectively. In this paper, we propose a
novel unified hashing approach for short text retrieval. In particular, the
optimal multi-granularity topics are chosen depending on the type of dataset. We
then use the optimal multi-granularity topics to learn hash codes and the
hashing function in two distinct ways, while tags are utilized to enhance the
semantic similarity of related texts. Extensive experiments demonstrate that the
proposed method performs better than the competitive methods on two public
datasets.